Yang LI Jinlin WANG Xuewen ZENG Xiaozhou YE
Montgomery modular multiplication is one of the most efficient algorithms for modular multiplication of large integers. On resource-constrained embedded processors, memory-access operations play as important a role as arithmetic operations in modular multiplication. To improve the efficiency of Montgomery modular multiplication on embedded processors, this paper concentrates on reducing memory-access operations by adding a few working registers. We first revisit popular existing Montgomery modular multiplication algorithms, and then present improved algorithms for Montgomery modular multiplication and squaring over arbitrary prime fields. The algorithms adopt the general ideas of the hybrid multiplication algorithm proposed by Gura and the lazy doubling algorithm proposed by Lee. Through careful optimization and redesign, we propose novel implementations of Montgomery multiplication and squaring, called the coarsely integrated product and operand hybrid scanning algorithm (CIPOHS) and the coarsely integrated lazy doubling algorithm (CILD). We then implement the algorithms on a general MIPS64 processor and on an OCTEON CN6645 processor equipped with specific multiply-add instructions. Experiments show that CIPOHS and CILD offer the best performance on both the general MIPS64 and OCTEON CN6645 processors, and that their advantage is most pronounced on processors with specific multiply-add instructions such as the OCTEON CN6645. When the modulus is 2048 bits, CIPOHS and CILD outperform the CIOS algorithm by 47% and 58%, respectively.
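For orientation, the CIOS (coarsely integrated operand scanning) baseline mentioned above can be sketched at word level as follows. This is a minimal illustration of the textbook CIOS method, not of the CIPOHS or CILD algorithms proposed in the paper. The function names, the 32-bit default word size, and the list-of-words representation are assumptions made for the example.

```python
def to_words(x, s, w):
    """Split integer x into s little-endian words of w bits each."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(s)]


def from_words(words, w):
    """Recombine little-endian w-bit words into one integer."""
    return sum(word << (w * i) for i, word in enumerate(words))


def mont_mul_cios(a, b, n, s, w=32):
    """Return a * b * R^-1 mod n with R = 2^(w*s), for odd n and a, b < n < R.

    Hypothetical helper for illustration; requires Python 3.8+ for pow(n, -1, m).
    """
    mask = (1 << w) - 1
    n_prime = (-pow(n, -1, 1 << w)) & mask  # -n^-1 mod 2^w
    A, B, N = to_words(a, s, w), to_words(b, s, w), to_words(n, s, w)
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += A * B[i]
        carry = 0
        for j in range(s):
            acc = t[j] + A[j] * B[i] + carry
            t[j] = acc & mask
            carry = acc >> w
        acc = t[s] + carry
        t[s] = acc & mask
        t[s + 1] = acc >> w
        # Reduction step: add m * N so the low word becomes zero, then shift right by one word
        m = (t[0] * n_prime) & mask
        carry = (t[0] + m * N[0]) >> w
        for j in range(1, s):
            acc = t[j] + m * N[j] + carry
            t[j - 1] = acc & mask
            carry = acc >> w
        acc = t[s] + carry
        t[s - 1] = acc & mask
        t[s] = t[s + 1] + (acc >> w)
    result = from_words(t[: s + 1], w)
    # Final conditional subtraction brings the result into [0, n)
    return result - n if result >= n else result
```

Each outer iteration reads and writes the accumulator `t` twice, once in the multiplication step and once in the reduction step, which is the kind of memory traffic the abstract says the proposed algorithms aim to reduce.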
Feng LIU Helin WANG Conggai LI Yanli XU
This letter proposes a scheme for the backward transmission of the propagation-delay-based three-user X channel, which is reciprocal to the forward transmission. The given scheme successfully delivers 10 expected messages in 6 time slots by cyclic interference alignment without loss of degrees of freedom, which supports efficient bidirectional transmission between the two ends of the three-user X channel.
Yiqiang SHENG Jinlin WANG Chaopeng LI Weining QI
In this paper, we propose an undirected model of learning systems, named the max-min-degree neural network, to realize centralized-decentralized collaborative computing. The basic idea of the proposal is a max-min-degree constraint, which extends a k-degree constraint to reduce the communication cost, where k is a user-defined degree of neurons. The max-min-degree constraint is defined such that the degree of each neuron lies between kmin and kmax. Accordingly, the Boltzmann machine is a special case of the proposal with kmin=kmax=n, where n is the fully connected degree of neurons. Evaluations show that the proposal substantially outperforms a state-of-the-art model of deep learning systems with respect to the communication cost. The price of this improvement is slower convergence with respect to data size, which matters little in big data processing.
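The max-min-degree constraint can be illustrated with a small check on an undirected graph. This is a sketch under assumptions: the edge-list representation and the names `neuron_degrees` and `satisfies_max_min_degree` are hypothetical and do not come from the paper.

```python
def neuron_degrees(num_neurons, edges):
    """Count the degree of each neuron in an undirected graph given as (u, v) pairs."""
    degrees = [0] * num_neurons
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def satisfies_max_min_degree(num_neurons, edges, k_min, k_max):
    """True if every neuron's degree lies between k_min and k_max inclusive."""
    return all(k_min <= d <= k_max for d in neuron_degrees(num_neurons, edges))


# A ring of five neurons: every neuron has degree 2.
ring = [(i, (i + 1) % 5) for i in range(5)]

# A fully connected graph on four neurons: every neuron has degree 3,
# which corresponds to the special case kmin = kmax = fully connected degree.
full = [(u, v) for u in range(4) for v in range(u + 1, 4)]
```

With these examples, `satisfies_max_min_degree(5, ring, 2, 3)` holds, while the fully connected graph satisfies the constraint only when both bounds equal 3.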
Yiqiang SHENG Jinlin WANG Haojiang DENG Chaopeng LI
In this paper, we propose a novel architecture for a deep learning system, named the k-degree layer-wise network, to realize efficient geo-distributed computing between the Cloud and the Internet of Things (IoT). Geo-distributed computing extends the Cloud to the geographical edge of the network, in the neighborhood of the IoT. The basic ideas of the proposal are a k-degree constraint and a layer-wise constraint. The k-degree constraint is defined such that the degree of each vertex on the h-th layer is exactly k(h), which extends existing deep belief networks and controls the communication cost. The layer-wise constraint is defined such that the layer-wise degrees are monotonically decreasing in the positive direction, which gradually reduces the dimension of the data. We prove that the k-degree layer-wise network is sparse, whereas a typical deep neural network is dense. In an evaluation on the M-distributed MNIST database, the proposal is superior to a state-of-the-art model in terms of communication cost and learning time, and it scales well.
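The two constraints can be expressed as a short validity check. This is only a sketch: the input format (one list of vertex degrees per layer), the function name, and the reading of "monotonically decreasing" as strictly decreasing are assumptions for illustration, not definitions taken from the paper.

```python
def satisfies_k_degree_layerwise(layer_degrees, k):
    """Check the k-degree and layer-wise constraints.

    layer_degrees[h] lists the degree of every vertex on layer h (hypothetical format).
    k[h] is the required degree k(h) for layer h.
    """
    if len(layer_degrees) != len(k):
        return False
    # k-degree constraint: every vertex on layer h has degree exactly k[h].
    exact = all(d == k[h] for h, degs in enumerate(layer_degrees) for d in degs)
    # Layer-wise constraint: k(h) decreases from one layer to the next
    # (assumed strict here).
    decreasing = all(k[h] > k[h + 1] for h in range(len(k) - 1))
    return exact and decreasing
```

For example, three layers whose vertices all have degrees 4, 3, and 2 respectively pass the check with `k = [4, 3, 2]`, whereas `k = [2, 3, 4]` fails the layer-wise constraint.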
Jiali YOU Hanxing XUE Yu ZHUO Xin ZHANG Jinlin WANG
Predicting the service performance of Internet applications is important in service selection, especially for video services. To design a predictor for forecasting video service performance in third-party applications, two well-known service providers in China, Iqiyi and Letv, are monitored and analyzed. The study shows that the performance measured during the observation period is time-series data with strong autocorrelation, which means it is predictable. To combine the temporal information and map the measured data to a proper feature space, the authors propose a predictor based on a Conditional Restricted Boltzmann Machine (CRBM), which can capture the potential temporal relationships in the historical information. In addition, the measured data from different sources are combined to enhance the training process, which enlarges the training set and helps avoid overfitting. Experiments show that combining the measured results from different resolutions of a video can raise prediction performance, and that the CRBM algorithm shows better prediction ability and more stable performance than the baseline algorithms.
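The link between autocorrelation and predictability can be made concrete with the standard sample autocorrelation at a given lag. The sketch below is a generic estimator, not the analysis procedure used in the study, and the function name and sample series are made up for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Illustrative data only: a steadily rising measurement and an alternating one.
rising = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
```

A value near 1 at small lags, as for the rising series, indicates that recent measurements carry information about the next one. The alternating series gives a negative value at lag 1.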